Dylan Foster
We consider the fundamental problem of inferring a hidden “ground truth” labeling for the vertices of a graph G from noisy pairwise measurements, which captures basic inference tasks in structured prediction and community detection. We show that given access to “side information”, in the form of vertex measurements weakly correlated with the ground truth, our prospects for approximate recovery in this setting improve dramatically. In trees, where approximate recovery without side information is impossible, we provide a simple, efficient, and optimal algorithm that infers the ground truth with low Hamming error, and whose guarantees imply non-trivial recovery rates for all connected graphs. For general graphs we provide a new characterization, in terms of tree decompositions and local cut structure, of which graph structures admit fast rates for approximate recovery, and provide an algorithm for efficient inference on these graphs.
Remarkably, our approach achieves approximate recovery on graphs that fail to satisfy even weak expansion properties: on the 3 × n grid graph we achieve Hamming error of $\Theta(p^2 n)$ (where $p$ is the edge noise level) with high probability. This improves on Globerson et al. (2015), who achieve the same rate only on 2D grids that are close enough to square to expand weakly. Our techniques apply to several other classes of graphs, where we provide matching upper and lower bounds. The thrust of our analysis is to (1) use a tree decomposition along with edge measurements to produce a small class of viable vertex labelings, and (2) apply fast-rate results from statistical learning theory to show that we can infer the ground truth from this class using vertex measurements.
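To make the measurement model concrete, here is a toy simulation on a path graph (a simple tree). It is a hedged sketch, not the paper's algorithm: edge measurements are signs of label products flipped with probability p, and side information consists of vertex measurements that are correct with probability 1/2 + delta. On a tree, the edge measurements pin down the labeling up to a single global sign, and the side information is used only to resolve that sign by majority vote; all function and parameter names below are illustrative.

```python
import random

def simulate(n=1000, p=0.05, delta=0.2, seed=0):
    """Toy simulation of noisy edge measurements plus weakly
    correlated vertex measurements on a path graph.
    Returns the normalized Hamming error of the recovered labeling."""
    rng = random.Random(seed)
    truth = [rng.choice([-1, 1]) for _ in range(n)]
    # Edge measurement: sign of x_u * x_v, flipped with probability p.
    edges = [truth[i] * truth[i + 1] * (-1 if rng.random() < p else 1)
             for i in range(n - 1)]
    # Vertex measurement (side information): correct w.p. 1/2 + delta.
    side = [x if rng.random() < 0.5 + delta else -x for x in truth]

    # Step 1: on a tree, edge measurements determine each label
    # relative to the root, i.e. the labeling up to a global sign.
    rel = [1]
    for e in edges:
        rel.append(rel[-1] * e)
    # Step 2: pick the global sign that best agrees with side information.
    agree = sum(r == s for r, s in zip(rel, side))
    sign = 1 if agree >= n - agree else -1
    guess = [sign * r for r in rel]
    return sum(g != t for g, t in zip(guess, truth)) / n
```

With p = 0 the edge measurements are exact, so the only question is the global sign, and the weakly correlated side information resolves it; with p > 0 this naive propagation accumulates errors along the path, which is precisely the difficulty the paper's tree-decomposition analysis addresses.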
Joint work with Daniel Reichman and Karthik Sridharan.